Analyzing Roles of Classifiers and Code-Mixed factors for Sentiment Identification

Authors

  • Soumil Mandal
  • Dipankar Das
Abstract

Multilingual speakers often switch between languages to express themselves on social communication platforms. Sometimes the original script of each language is preserved, but using a single common script for all the languages is also popular because it is convenient. In such cases, multiple languages with different grammatical rules are mixed in the same script, which makes natural language processing challenging, including accurate sentiment identification. In this paper, we report the results of various experiments carried out on a movie review dataset that has this code-mixing property, combining two languages, English and Bengali, both typed in Roman script. We tested various machine learning algorithms trained only on English features on our code-mixed data and achieved a maximum accuracy of 59.00% using a Naïve Bayes (NB) model. We also tested various models trained on code-mixed data as well as English features, and the highest accuracy, 72.50%, was obtained using a Support Vector Machine (SVM) model. Finally, we analyzed the misclassified snippets and discuss the challenges that need to be resolved to improve accuracy.
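The contrast the abstract draws, between a model trained only on English features and one that also sees code-mixed text, can be illustrated with a toy sketch. This is not the authors' implementation: the multinomial Naive Bayes below is written from scratch with add-one smoothing, the names `train_nb`, `predict_nb`, and `known_tokens` are hypothetical, and the short review snippets are invented for illustration rather than taken from the paper's dataset.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """Fit a multinomial Naive Bayes model on (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        tokens = text.lower().split()
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab


def known_tokens(model, text):
    """Return the tokens of `text` that the model saw during training."""
    _, _, vocab = model
    return [tok for tok in text.lower().split() if tok in vocab]


def predict_nb(model, text):
    """Return the label with the highest log-posterior (add-one smoothing)."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        # Tokens never seen in training carry no evidence and are skipped.
        for tok in known_tokens(model, text):
            score += math.log(
                (word_counts[label][tok] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy data (assumption): English-only reviews.
english_only = [
    ("a good movie", "pos"),
    ("great acting and good story", "pos"),
    ("a bad movie", "neg"),
    ("boring story and bad acting", "neg"),
]
# Invented toy data (assumption): Bengali-English snippets in Roman script.
code_mixed = [
    ("cinema ta khub bhalo", "pos"),
    ("golpo ta bhalo chilo", "pos"),
    ("cinema ta khub kharap", "neg"),
    ("golpo ta kharap chilo", "neg"),
]

english_model = train_nb(english_only)
mixed_model = train_nb(english_only + code_mixed)

snippet = "acting ta bhalo"
print("known to English-only model:", known_tokens(english_model, snippet))
print("known to mixed model:", known_tokens(mixed_model, snippet))
print("mixed model prediction:", predict_nb(mixed_model, snippet))
```

In this sketch the English-only model recognizes only one token of the code-mixed snippet, and that token is not sentiment-bearing, so it has nothing to separate the classes with. The model that also saw Romanized Bengali words can use "bhalo" as evidence. The paper's actual features and classifiers may differ.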


Related papers

Language-Independent Twitter Sentiment Analysis

Millions of tweets posted daily contain opinions and sentiment of users in a variety of languages. Sentiment classification can benefit companies by providing data for analyzing customer feedback for products or conducting market research. Sentiment classifiers need to be able to handle tweets in multiple languages to cover a larger portion of the available tweets. Traditional classifiers are h...


Sentiment Identification in Code-Mixed Social Media Text

Sentiment analysis is the Natural Language Processing (NLP) task dealing with the detection and classification of sentiments in texts. While some tasks deal with identifying presence of sentiment in text (Subjectivity analysis), other tasks aim at determining the polarity of the text categorizing them as positive, negative and neutral. Whenever there is presence of sentiment in text, it has a s...


Sentiment Analysis of Code-Mixed Languages leveraging Resource Rich Languages

Code-mixed data is an important challenge of natural language processing because its characteristics completely vary from the traditional structures of standard languages. In this paper, we propose a novel approach called Sentiment Analysis of Code-Mixed Text (SACMT) to classify sentences into their corresponding sentiment positive, negative or neutral, using contrastive learning. We utilize th...


Profiling Student Interactions in Threaded Discussions with Speech Act Classifiers

On-line discussion is a popular form of web-based computer-mediated communication and is an important medium for distance education. Automatic tools for analyzing online discussions are highly desirable for better information management and assistance. This paper presents an approach for automatically profiling student interactions in on-line discussions. Using N-gram features and linear SVM, w...


Sentiment Analysis of Social Networking Data Using Categorized Dictionary

Sentiment analysis is the process of analyzing a person’s perception or belief about a particular subject matter. However, finding correct opinion or interest from multi-facet sentiment data is a tedious task. In this paper, a method to improve the sentiment accuracy by utilizing the concept of categorized dictionary for sentiment classification and analysis is proposed.  A categorized dictiona...



Journal:
  • CoRR

Volume abs/1801.02581

Publication date 2017